Development of a Hindi Lemmatizer

Authors

  • Snigdha Paul
  • Nisheeth Joshi
  • Iti Mathur
Abstract

We live in a translingual society: to communicate with people from different parts of the world, we need expertise in their respective languages. Learning all of these languages is not feasible, so we need a mechanism that can do this task for us. Machine translators have emerged as tools that can perform it. Developing a machine translator requires developing several different sets of rules. The first module in the machine translation pipeline is morphological analysis, which includes stemming and lemmatization. In this paper we have created a lemmatizer which generates rules for removing affixes, along with additional rules for creating a proper root word.

Keywords: lemmatizer, lemmatization, inflectional, derivational.
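The two rule types named in the abstract (affix removal, then repair into a proper root word) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the rule table `RULES`, the `MIN_STEM_LEN` guard, and the function name `lemmatize` are all hypothetical, and the four suffix rules are common Hindi plural patterns chosen for the example rather than rules taken from the paper.

```python
# Hypothetical rule table: (suffix to strip, string to append as repair).
# Illustrative only; these are NOT the rules from the paper.
RULES = [
    ("ियाँ", "ी"),  # कहानियाँ -> कहानी (stories -> story)
    ("ियों", "ी"),  # कहानियों -> कहानी (oblique plural)
    ("ें", ""),     # किताबें -> किताब (books -> book)
    ("ों", ""),     # घरों -> घर (oblique plural of "house")
]

# Assumed guard: never strip a suffix if fewer than this many
# code points would remain, to avoid over-stripping short words.
MIN_STEM_LEN = 2


def lemmatize(word, rules=RULES):
    """Strip the longest matching suffix, then append its repair string."""
    # Try longer suffixes first so "ियों" wins over the shorter "ों".
    for suffix, repair in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LEN:
            return stem + repair
    # No rule applied: treat the word as already being a root form.
    return word


if __name__ == "__main__":
    for w in ["कहानियाँ", "कहानियों", "किताबें", "घरों", "घर"]:
        print(w, "->", lemmatize(w))
```

Plain suffix stripping alone would turn कहानियों into कहान, which is not a word; the repair string is what restores the dictionary form कहानी. A real system would need far more rules and some way to resolve ambiguous suffixes, which this sketch does not attempt.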

Similar resources

IndiLem@FIRE-MET-2014 : An Unsupervised Lemmatizer for Indian Languages

An unsupervised and language-independent lemmatization procedure has been developed for major Indian languages (Bengali, Hindi, etc.), which are morphologically very rich and agglutinative in nature. The task of a lemmatizer is to map an inflected surface word to its appropriate dictionary root word, and it is a prerequisite for implementing several NLP tools like Word Sense Disambiguation system...

Design of a Rule Based Hindi Lemmatizer

Stemming is the process of clipping the affixes off an input word to obtain its root word, but stemming does not necessarily yield a genuine, meaningful root word. To overcome this problem we propose a solution: a lemmatizer. Lemmatization is the process by which we carve out the lemma from the given word, and we can also add additional rules to make the clipped word a proper s...

Using a Lemmatizer to Support the Development and Validation of the Greek WordNet

In this paper we aim to give a description of the computational tools that have been designed and implemented to support the development and validation process of the Greek WordNet, which is currently being developed in the framework of the BalkaNet project. In particular, we focus on the description of a lemmatizer for the Greek language, which has been used as the basis for a number of tools ...

Analysis of the Hindi Circle Method in Determining the Qibla Direction of Mosques (Case Study: Jameh Mosque of Isfahan)

The Kaaba, known as the Qibla and the focal point of Muslims, is of paramount importance. Many scientists in the fields of mathematics, astronomy and geography throughout the Islamic world tried to find exact methods and procedures to determine the direction of the Qibla. One of these methods is the Hindi Circle, used to determine the Qibla of mosques. Most often the scientists, scholars and astronomers gath...

A Self-Learning Context-Aware Lemmatizer for German

Accurate lemmatization of German nouns mandates the use of a lexicon. Comprehensive lexicons, however, are expensive to build and maintain. We present a self-learning lemmatizer capable of automatically creating a full-form lexicon by processing German documents.

Journal title:
  • CoRR

Volume abs/1305.6211

Publication date 2013